ConBind: motif-aware cross-species alignment for the identification of functional transcription factor binding sites

Authors

  • Stefan H Lelieveld
  • Judith Schütte
  • Maurits J J Dijkstra
  • Punto Bawono
  • Sarah J Kinston
  • Berthold Göttgens
  • Jaap Heringa
  • Nicola Bonzanni
Abstract

Eukaryotic gene expression is regulated by transcription factors (TFs) binding to promoters as well as distal enhancers. TFs recognize short but specific binding sites (TFBSs) located within these promoter and enhancer regions. Functionally relevant TFBSs are often highly conserved during evolution, leaving a strong phylogenetic signal. While multiple sequence alignment (MSA) is a potent tool for detecting this phylogenetic signal, current MSA implementations are optimized to align the maximum number of identical nucleotides. This approach might result in the omission of conserved motifs that contain interchangeable nucleotides, such as the ETS motif (IUPAC code: GGAW). Here, we introduce ConBind, a novel method to enhance the alignment of short motifs, even if their mutual sequence similarity is only partial. ConBind improves the identification of conserved TFBSs by increasing the alignment accuracy of TFBS families within orthologous DNA sequences. Functional validation of the Gfi1b +13 enhancer reveals that ConBind identifies additional functionally important ETS binding sites that were missed by all other tested alignment tools. In addition to the analysis of known regulatory regions, our web tool is useful for the analysis of TFBSs in previously uncharacterized DNA regions identified through ChIP-sequencing.
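
The point about interchangeable nucleotides can be illustrated with a small sketch. This is not the ConBind algorithm, only a toy example of why two sites can both satisfy the degenerate ETS motif GGAW while differing at one position, which an identity-maximizing aligner would count as a mismatch. The helper names (`iupac_to_regex`, `find_motif`) and the two short sequences are hypothetical and chosen purely for illustration; the IUPAC table is the standard nucleotide code.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif such as GGAW into a regex such as GGA[AT]."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_motif(sequence, motif):
    """Return (position, matched site) pairs, including overlapping matches."""
    # The lookahead lets the regex engine report overlapping occurrences.
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical toy "orthologous" fragments that differ at a single position.
seq_a = "TTCAGGAAGTC"
seq_b = "TTCAGGATGTC"

hits_a = find_motif(seq_a, "GGAW")
hits_b = find_motif(seq_b, "GGAW")

# Positions where the two fragments are not identical.
mismatches = [i for i, (x, y) in enumerate(zip(seq_a, seq_b)) if x != y]

print(hits_a)      # site found in the first fragment
print(hits_b)      # site found in the second fragment
print(mismatches)  # differing positions under strict identity
```

Both fragments carry a GGAW site at the same offset, yet strict nucleotide identity flags the last motif position as a difference. A motif-aware method treats the two sites as members of the same TFBS family.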

Similar resources

Species Selection for Phylogeny-Based Motif Detection

Detecting conserved regions in multiple species alignment is crucial when modeling orthologous entities. However, in phylogenetic analysis of entities other than genes, for instance transcription factor binding sites (TFBS), this proves to be non-trivial due to the high functional turnover and incomplete orthology even within close species, such as the Drosophila clade. Having more species does not...

Reliable prediction of transcription factor binding sites by phylogenetic verification.

We present a statistical methodology that largely improves the accuracy in computational predictions of transcription factor (TF) binding sites in eukaryote genomes. This method models the cross-species conservation of binding sites without relying on accurate sequence alignment. It can be coupled with any motif-finding algorithm that searches for overrepresented sequence motifs in individual s...

Factors influencing the identification of transcription factor binding sites by cross-species comparison.

As the number of sequenced genomes has grown, the questions of which species are most useful and how many genomes are sufficient for comparison have become increasingly important for comparative genomics studies. We have systematically addressed these questions with respect to phylogenetic footprinting of transcription factor (TF) binding sites in the gamma-proteobacteria, and have evaluated th...

CONREAL: conserved regulatory elements anchored alignment algorithm for identification of transcription factor binding sites by phylogenetic footprinting.

Prediction of transcription-factor target sites in promoters remains difficult due to the short length and degeneracy of the target sequences. Although the use of orthologous sequences and phylogenetic footprinting approaches may help in the recognition of conserved and potentially functional sequences, correct alignment of the short transcription-factor binding sites can be problematic for est...

PromoterSweep: a tool for identification of transcription factor binding sites

There are many tools available for the prediction of potential promoter regions and the transcription factor binding sites (TFBS) harboured by them. Unfortunately, these tools cannot really avoid the prediction of vast amounts of false positives, the greatest problem in promoter analysis. The combination of different methods and algorithms has shown an improvement in prediction accuracy for sim...

Journal title:

Volume 44, Issue

Pages -

Publication date: 2016